import matplotlib.pyplot as plt
import datetime as dt
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import statsmodels.api as sm
import scipy as sp
import yfinance as yf
import os
from dotenv import load_dotenv
from statsmodels.stats.stattools import durbin_watson as dw_test
from statsmodels.graphics.tsaplots import plot_acf
load_dotenv()
fred_api_key = os.getenv("FRED_API_KEY")

from fredapi import Fred

fred = Fred(api_key=fred_api_key)
Recall that a classical linear regression model \[ Y_t = \beta_1 +\beta_2 X_t +u_t \]
has an assumption \[ E\left(u_{i} u_{j}\right)=0 \quad i \neq j \] i.e. the disturbance terms are uncorrelated with each other. However, in many real situations this assumption does not hold, especially with time series data. Many economic or financial time series have ‘momentum’: for instance, if we see a boost in consumer confidence this month, the momentum is likely to carry over into next month unless something unexpected happens.
If this assumption is violated, we have an autocorrelation issue \[ E\left(u_{i} u_{j}\right)\neq 0 \quad i \neq j \] There are many reasons why autocorrelation arises, but for now we won’t worry about the exact cause.
If we say a series is positively autocorrelated, the correlation between successive values is positive; conversely, we call it negatively autocorrelated. In a financial context, positive autocorrelation corresponds to trend following (momentum), and negative autocorrelation to mean reversion.
For instance, the weekly return of Microsoft (MSFT) has a slight negative autocorrelation, which hints at the possibility of a mean-reverting trading strategy.
start = dt.datetime(2012, 1, 1)
stock = yf.download("MSFT", start=start)["Adj Close"].pct_change()
stock_weekly = stock.resample("W").last()
stock_weekly.autocorr()
-0.07624732682251825
What about interest rates? Let’s try the 10-year Treasury yield.
start = dt.datetime(1950, 1, 1)
tr10_yield = yf.download("^TNX", start=start)["Adj Close"].dropna()
tr10_yield.diff().autocorr()
0.0636346079626597
tr10_yield_annual = tr10_yield.resample("YE").last().diff().dropna()
tr10_yield_annual.autocorr()
-0.18329436633967097
We can see that the interest rate has considerably strong negative autocorrelation in annual changes, but not in daily changes. In plot_acf, alpha sets the significance level: with alpha = 0.05, a sample autocorrelation has only a \(5\%\) chance of falling outside the blue band if the true autocorrelation is zero.
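As a quick illustration (a sketch not in the original text, reusing the weekly MSFT returns computed above), we can draw the ACF with alpha = 0.05:
# Sketch: ACF of weekly MSFT returns; spikes outside the blue band are
# significant at the 5% level
g = plot_acf(stock_weekly.dropna(), lags=52, alpha=0.05,
             title="Autocorrelation of MSFT Weekly Returns")
g.set_size_inches(16, 6)
plt.show()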
Does White Noise Have Autocorrelation?
white_noise = np.random.randn(1000)
g = plot_acf(
    white_noise, lags=100, title="Autocorrelation of White Noise"
).set_size_inches(16, 6)
Stock prices follow a random walk, but their returns are usually close to white noise, so models have little power to predict stock returns.
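As a rough check (a sketch not in the original, reusing the daily return series stock from above), the ACF of daily MSFT returns looks much like the white-noise ACF above:
# Sketch: ACF of daily MSFT returns -- most spikes should stay inside the band
g = plot_acf(stock.dropna(), lags=50, title="Autocorrelation of MSFT Daily Returns")
g.set_size_inches(16, 6)
plt.show()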
Seasonality And Autocorrelation
= yf.download("QG=F", start=start)["Adj Close"].dropna()
electric_gas
# Calculate the differences
= electric_gas.diff()
electric_gas_diff1 = electric_gas.diff(4)
electric_gas_diff2
# Plot the data
=(16, 12), subplots=True, color="orange")
electric_gas.plot(figsize=(16, 12), subplots=True, color="gold")
electric_gas_diff1.plot(figsize=(16, 12), subplots=True) electric_gas_diff2.plot(figsize
If the data are seasonal, a first-order difference won’t remove the pattern; you need a seasonal difference at the seasonal lag (here, a lag-4 difference).
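As a quick numeric check (a sketch, assuming the differenced series above are still in memory), compare the lag-4 autocorrelation before and after the seasonal difference; the seasonal difference should shrink it noticeably if the seasonality is real:
# Lag-4 autocorrelation of the first difference vs. the lag-4 difference
print(electric_gas_diff1.dropna().autocorr(lag=4))
print(electric_gas_diff2.dropna().autocorr(lag=4))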
AR And MA Processes
If the disturbance term is generated by the process \[ u_{t}=\rho u_{t-1}+\varepsilon_{t} \quad-1<\rho<1 \] we call it a First-Order Autoregressive Process, denoted \(AR(1)\). The innovation \(\varepsilon_{t}\) satisfies the standard properties of an OLS disturbance term, shown below, and \(\rho\) is the coefficient of autocorrelation at lag \(1\) \[ \begin{aligned} E\left(\varepsilon_{t}\right) &=0 \\ \operatorname{Var}\left(\varepsilon_{t}\right) &=\sigma_{\varepsilon}^{2} \\ \operatorname{Cov}\left(\varepsilon_{t}, \varepsilon_{t+s}\right) &=0 \quad s \neq 0 \end{aligned} \] If \(|\rho|<1\) we say the \(AR(1)\) process is stationary; we’ll return to this topic in more detail later.
The main alternative is the moving average model, which is a weighted sum of current and previous values of the \(\varepsilon\)’s; for instance, \(MA(2)\) is \[ u_t = \lambda_0 \varepsilon_t+ \lambda_1 \varepsilon_{t-1}+ \lambda_2 \varepsilon_{t-2} \]
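To make the two processes concrete, here is a small simulation sketch (not from the original text; the coefficients are arbitrary choices for illustration):
# Simulate an AR(1) process: u_t = 0.8 * u_{t-1} + eps_t
rho_sim = 0.8
eps = np.random.randn(500)
u_ar1 = np.zeros(500)
for t in range(1, 500):
    u_ar1[t] = rho_sim * u_ar1[t - 1] + eps[t]

# Simulate an MA(2) process: u_t = eps_t + 0.6 * eps_{t-1} + 0.3 * eps_{t-2}
eps_series = pd.Series(np.random.randn(500))
u_ma2 = (eps_series + 0.6 * eps_series.shift(1) + 0.3 * eps_series.shift(2)).dropna()

plot_acf(u_ar1, lags=20, title="ACF of a Simulated AR(1), rho = 0.8")
plot_acf(u_ma2, lags=20, title="ACF of a Simulated MA(2)")
plt.show()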
Causes Of Autocorrelation
Model Misspecification
df = pd.read_excel(
    "../dataset/econometrics_practice_data.xlsx", sheet_name="Mortality_NI_WB_2019"
)
df.head()

df_mort = df[df["Series_Name"] == "mortality_rate_per_1000"]
df_NI = df[df["Series_Name"] == "NI_perCap_2010constUSD"]
reg_results = sp.stats.linregress(df_NI[2019], df_mort[2019])
Y_fit = reg_results[1] + reg_results[0] * df_NI[2019]
residuals = np.asarray(df_mort[2019]) - np.asarray(Y_fit)
fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(18, 22))
ax[0].scatter(df_NI[2019], df_mort[2019])
ax[0].plot(df_NI[2019], Y_fit, color="tomato", lw=3, zorder=0)
ax[0].set_xlabel("National Income per capita 2019 (Unit: 2010 const USD)")
ax[0].set_ylabel("Mortality Rate per 1000 births")
ax[0].set_ylim([0, 90])
ax[0].set_title("130 Countries' Infant Mortality and National Income")
ax[0].grid()
ax[1].scatter(df_NI[2019], residuals)
ax[1].set_ylim([-100, 100])
ax[1].set_title("Residual Plot")
ax[1].grid()
plt.show()
Omitted Variable Bias
Be aware that a low \(dw\) statistic might signal an omitted variable, but not vice versa; we can demonstrate this with a Monte Carlo simulation.
N = 100
beta1, beta2, beta3 = 3, 4, 5
X2 = np.arange(N)
dw_array_omit_X2, dw_array_omit_X3, dw_array = [], [], []

for i in range(10000):
    u = np.random.randn(N)
    X3 = np.random.rand(N)
    Y = beta1 + beta2 * X2 + beta3 * X3 + u
    df = pd.DataFrame([Y, X2, X3]).T
    df.columns = ["Y", "X2", "X3"]

    model = smf.ols(formula="Y ~ X2", data=df)
    results = model.fit()
    dw_array_omit_X3.append(dw_test(results.resid))

    model = smf.ols(formula="Y ~ X3", data=df)
    results = model.fit()
    dw_array_omit_X2.append(dw_test(results.resid))

    model = smf.ols(formula="Y ~ X2 + X3", data=df)
    results = model.fit()
    dw_array.append(dw_test(results.resid))
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(18, 8))
ax[0].hist(dw_array_omit_X3, bins=100)
ax[0].set_title("Durbin Watson Statistic, Omitted $X_3$")
ax[1].hist(dw_array_omit_X2, bins=100)
ax[1].set_title("Durbin Watson Statistic, Omitted $X_2$")
ax[2].hist(dw_array, bins=100)
ax[2].set_title("Durbin Watson Statistic, Correct Specification")
plt.show()
OLS Estimation With Presence Of Autocorrelation
As in the case of heteroscedasticity, in the presence of autocorrelation the OLS estimators are still linear, unbiased, consistent, and asymptotically normally distributed, but they are no longer efficient (i.e., minimum variance).
First use an \(AR(1)\) model to generate an autocorrelated series as our disturbance term \[ u_{t}=\rho u_{t-1}+\varepsilon_{t} \] Choose any \(\rho\) that fits your preference; here I use \(0.98\).
T = 100
epsilon = 10 * np.random.randn(T)
u = [100]
u_current = u[0]
rho = 0.98

for i in epsilon:
    u_next = rho * u_current + i
    u.append(u_next)
    u_current = u_next

u = u[1:]
Then assume \(X\) is fixed at \(1, 2, 3, \ldots, T\) and define the true relationship as \[ Y_t = 2 + 3 X_t + u_t \]
X = np.arange(1, T + 1)
Y = 2 + 3 * X + u
PRF = 2 + 3 * X
df = pd.DataFrame([X, Y]).T
df.columns = ["X", "Y"]
model = smf.ols(formula="Y ~ X", data=df)
results = model.fit()
print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Y R-squared: 0.927
Model: OLS Adj. R-squared: 0.927
Method: Least Squares F-statistic: 1250.
Date: Sun, 27 Oct 2024 Prob (F-statistic): 1.39e-57
Time: 17:33:25 Log-Likelihood: -410.14
No. Observations: 100 AIC: 824.3
Df Residuals: 98 BIC: 829.5
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 92.9894 2.976 31.246 0.000 87.083 98.895
X 1.8086 0.051 35.349 0.000 1.707 1.910
==============================================================================
Omnibus: 3.375 Durbin-Watson: 0.515
Prob(Omnibus): 0.185 Jarque-Bera (JB): 2.695
Skew: -0.276 Prob(JB): 0.260
Kurtosis: 2.415 Cond. No. 117.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
fig, ax = plt.subplots(figsize=(18, 8))
ax.scatter(X, Y, color="tomato", s=5)
ax.plot(X, results.fittedvalues, label="Fitted Line", lw=3, zorder=0)
ax.plot(X, PRF, label="Population Regression Line", lw=3, zorder=-1)
ax.grid()
ax.legend(prop={"size": 18})
plt.show()
You might be surprised to see how far the true relationship (the Population Regression Function) deviates from the fitted line; this is exactly due to autocorrelation. As \(\rho\) gets smaller, the generated points stay closer to the PRF, and therefore the fitted line ends up closer to the PRF as well.
The point is that in the presence of autocorrelation, the estimation results can deviate wildly from the true relationship, and hence the usual hypothesis tests are invalid too. This happens very commonly in time series estimation.
Detection of Autocorrelation
Graphical Method
Though the non-autocorrelation assumption relates to the disturbance terms \(u_t\), which are unobservable, we can instead visualize the residuals \(e_t\). A visual examination of the residuals often provides useful information.
Here are the simulated residuals from the example above. Though the graphical method is subjective and qualitative, you can still draw a rough conclusion if you see a pattern in the residuals.
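A minimal sketch of this check (assuming results from the simulated regression above is still in memory) simply plots the residuals over time:
# Plot the residuals; a slowly wandering pattern suggests positive autocorrelation
fig, ax = plt.subplots(figsize=(18, 8))
ax.plot(results.resid.values, color="tomato")
ax.set_title("Residuals of the Simulated Regression")
ax.grid()
plt.show()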
Durbin-Watson Test
The standard test for first-order autocorrelation is the Durbin-Watson test, which is part of the standard OLS estimation output. It is calculated by the formula \[ dw=\frac{\sum_{t=2}^{T}\left(e_{t}-e_{t-1}\right)^{2}}{\sum_{t=1}^{T} e_{t}^{2}} \]
where \(e_t\) is residual at time \(t\).
As an example, we can reproduce the OLS estimation of disposable income and house price in China.
df = pd.read_excel(
    "../dataset/econometrics_practice_data.xlsx", sheet_name="CN_Cities_house_price"
)
model = smf.ols(formula="house_price ~ salary", data=df)
results = model.fit()
from statsmodels.stats.stattools import durbin_watson as dw_test
dw_test(results.resid)
0.9255117521586897
Or we can write our own Durbin-Watson function to verify the results
def durbin_watson_test(series):
    nomin = np.sum((series[1:] - series.shift(1).dropna()) ** 2)
    denom = np.sum(series**2)
    return nomin / denom
durbin_watson_test(results.resid)
0.9255117521586897
But how to interpret this number? Some mathematical manipulation of the Durbin-Watson statistic brings some insight \[ dw=\frac{\sum_{t=2}^{T}\left(e_{t}-e_{t-1}\right)^{2}}{\sum_{t=1}^{T} e_{t}^{2}} = \frac{\sum_{t=2}^{T}\left(e^2_{t}-2e_te_{t-1}+e^2_{t-1}\right)}{\sum_{t=1}^{T} e_{t}^{2}} = \frac{\sum_{t=2}^Te_t^2}{\sum_{t=1}^Te_t^2}+\frac{\sum_{t=2}^Te_{t-1}^2}{\sum_{t=1}^Te_t^2}-2\frac{\sum_{t=2}^Te_te_{t-1}}{\sum_{t=1}^Te_t^2} \]
As \(T\rightarrow \infty\), the first two terms each approach \(1\), and the third term \(\frac{\sum_{t=2}^Te_te_{t-1}}{\sum_{t=1}^Te_t^2}\) approaches \(\hat{\rho}\), because it is the simple OLS estimator of \(\rho\) in a regression of \(e_t\) on \(e_{t-1}\).
Therefore \(dw \rightarrow 2-2\rho\) given \(T\rightarrow \infty\).
If there is no autocorrelation, \(\rho=0\) and \(dw\) should be around \(2\); if \(\rho\) is close to \(1\) (strong positive autocorrelation), \(dw\) will be close to \(0\); if \(\rho\) is close to \(-1\) (strong negative autocorrelation), \(dw\) will be close to \(4\).
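As a quick sanity check (a sketch, assuming results from the house-price regression above), compare \(dw\) to \(2-2\hat{\rho}\), where \(\hat{\rho}\) is the lag-1 autocorrelation of the residuals:
# dw should be close to 2 - 2 * rho_hat
rho_hat = pd.Series(results.resid).autocorr()
print(dw_test(results.resid), 2 - 2 * rho_hat)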
The \(dw\) statistic does not have a single set of exact critical values (its distribution depends on the particular regressors, so exact critical values cannot be tabulated once and for all). However, upper and lower bounds for \(dw\) are available; search for a Durbin-Watson table on Google and click any of the images that show on top, or click here.
In the house price data, the number of observations is \(25\) and the number of independent variables is \(1\); here are the upper and lower bounds from the table. \[ d_L = 1.055\\ d_U = 1.210 \] Our result \(dw=0.9255\) is below the lower bound, which suggests rejecting the null hypothesis \(\rho=0\). Why? Because it is too far from \(2\). If \(dw\) were above the upper bound, we would fail to reject the null hypothesis; if \(dw\) falls between the lower and upper bounds, the test is inconclusive.
However, keep in mind that apparent autocorrelation can also be attributed to model specification errors, and in practice we mostly care about autocorrelation in time series rather than cross-sectional data.
Here is an example with the \(EUR/USD\) exchange rate.
start = dt.datetime(2005, 1, 1)
end = dt.datetime.today()
EURUSD = fred.get_series("DEXUSEU", start, end)

fig, ax = plt.subplots(figsize=(18, 8))
ax.plot(EURUSD, color="tomato", label="EUR/USD")
ax.legend()
ax.grid()
plt.show()
durbin_watson_test(EURUSD)
3.095631850387786e-05
The \(dw\) statistic is practically \(0\), which implies \(\rho \approx 1\); this shouldn’t be a shocking result. Most asset price time series show strong first-order autocorrelation, which is why we like trading the trend or momentum.
However, the \(dw\) test is mainly meant for residuals; here is another example demonstrating highly autocorrelated residuals.
start = dt.datetime(2005, 1, 1)
end = dt.datetime.today()

real_GDP_perCap = fred.get_series("A939RX0Q048SBEA", start, end)
priv_resid_fixInv = fred.get_series("PRFI", start, end)
df = pd.DataFrame(
    {"real_GDP_perCap": real_GDP_perCap, "priv_resid_fixInv": priv_resid_fixInv}
)

model = smf.ols(formula="real_GDP_perCap ~ priv_resid_fixInv", data=df)
results = model.fit()
print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: real_GDP_perCap R-squared: 0.714
Model: OLS Adj. R-squared: 0.710
Method: Least Squares F-statistic: 189.5
Date: Sun, 27 Oct 2024 Prob (F-statistic): 2.46e-22
Time: 17:33:27 Log-Likelihood: -718.89
No. Observations: 78 AIC: 1442.
Df Residuals: 76 BIC: 1446.
Df Model: 1
Covariance Type: nonrobust
=====================================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------------
Intercept 4.786e+04 839.264 57.024 0.000 4.62e+04 4.95e+04
priv_resid_fixInv 14.9047 1.083 13.764 0.000 12.748 17.061
==============================================================================
Omnibus: 24.755 Durbin-Watson: 0.084
Prob(Omnibus): 0.000 Jarque-Bera (JB): 34.570
Skew: -1.420 Prob(JB): 3.11e-08
Kurtosis: 4.603 Cond. No. 2.33e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.33e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
The drawback of the \(dw\) test is that it assumes nonstochastic regressors (the independent variables, as we called them in previous contexts), i.e. regressors that are fixed in repeated sampling. This drawback weakens the validity of the \(dw\) test in time series modelling.
Breusch–Godfrey Test
The Breusch–Godfrey test is more general than Durbin-Watson in that it allows for higher-order autoregressive schemes and higher-order moving averages of white noise.
Suppose the disturbance terms are generated by an \(AR(3)\) process \[ u_{t}=\rho_1 u_{t-1}+\rho_2 u_{t-2}+\rho_3 u_{t-3}+\varepsilon_{t} \] and choose whatever \(\rho\)’s you like; \(\varepsilon_t\) satisfies the classical linear regression assumptions.
T = 50
u = np.array([0.5, 0.45, 0.4])
rho_array = np.array([0.5, 0.4, 0.2])

for i in range(T):
    u_new_element = np.dot(u[:3], rho_array) + 100 * np.random.randn()
    u = np.append(u_new_element, u)
fig, ax = plt.subplots(figsize=(18, 8))
ax.plot(u, color="tomato")
ax.grid()
plt.show()
X = np.arange(1, len(u) + 1)
Y = 2 + 3 * X + u
PRF = 2 + 3 * X
df = pd.DataFrame([X, Y]).T
df.columns = ["X", "Y"]
model = smf.ols(formula="Y ~ X", data=df)
results = model.fit()
print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Y R-squared: 0.884
Model: OLS Adj. R-squared: 0.881
Method: Least Squares F-statistic: 386.9
Date: Sun, 27 Oct 2024 Prob (F-statistic): 1.81e-25
Time: 17:33:28 Log-Likelihood: -322.41
No. Observations: 53 AIC: 648.8
Df Residuals: 51 BIC: 652.8
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1151.6375 30.137 38.214 0.000 1091.136 1212.139
X -19.1031 0.971 -19.671 0.000 -21.053 -17.153
==============================================================================
Omnibus: 2.480 Durbin-Watson: 1.007
Prob(Omnibus): 0.289 Jarque-Bera (JB): 1.689
Skew: 0.413 Prob(JB): 0.430
Kurtosis: 3.285 Cond. No. 63.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Once you have the residuals, run the auxiliary regression \[ \hat{u}_{t}=\alpha_{1}+\alpha_{2} X_{t}+\hat{\rho}_{1} \hat{u}_{t-1}+\hat{\rho}_{2} \hat{u}_{t-2}+\cdots+\hat{\rho}_{p} \hat{u}_{t-p}+\varepsilon_{t} \]
Use the coefficient of determination \(R^2\) from the auxiliary regression to construct the statistic \[ (n-p) R^{2} \sim \chi_{p}^{2} \] where \(p\) is the number of lags and \(n\) is the number of observations.
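To make the mechanics concrete, here is a hand-rolled sketch of the test with \(p=3\) lags (not from the original text; it assumes results and df from the regression above, fills pre-sample residuals with zeros, and uses the \((n-p)R^2\) statistic given here, so the numbers may differ slightly from statsmodels’ implementation):
# Auxiliary regression of the residuals on X and their own lags
p = 3
aux = pd.DataFrame({"e": results.resid.values, "X": df["X"].values})
for lag in range(1, p + 1):
    aux[f"e_lag{lag}"] = aux["e"].shift(lag)
aux = aux.fillna(0)                          # pre-sample residuals set to zero
aux_results = smf.ols("e ~ X + e_lag1 + e_lag2 + e_lag3", data=aux).fit()
LM = (len(aux) - p) * aux_results.rsquared   # (n - p) * R^2
p_value = 1 - sp.stats.chi2.cdf(LM, p)
print(LM, p_value)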
Fortunately, statsmodels has a Breusch-Godfrey function; the first returned value is the LM test statistic and the second is its \(p\)-value.
import statsmodels.stats.diagnostic as dg
print(dg.acorr_breusch_godfrey(results, nlags=3))
(13.250270434963353, 0.004125485159271261, 5.333478473420609, 0.0029764733478423247)
Remedies For Autocorrelation
If we knew the \(\rho\)’s, it would be easy to convert the model to a generalized difference equation form \[ \left(Y_{t}-\rho Y_{t-1}\right)=\beta_{1}(1-\rho)+\beta_{2}\left(X_{t}-\rho X_{t-1}\right)+\varepsilon_{t} \] such that the disturbance term is now \(\varepsilon_t\), which satisfies the classical OLS assumptions.
However, we rarely know the true values of the \(\rho\)’s.
If you have no information about \(\rho\), a rough rule of thumb is to check the \(dw\) statistic: if \[ dw < R^2 \] assume \(\rho =1\); then the generalized difference equation becomes \[ Y_{t}- Y_{t-1}=\beta_{2}\left(X_{t}-X_{t-1}\right)+\varepsilon_{t} \] or \[ \Delta Y_{t}=\beta_{2}\Delta X_{t}+\varepsilon_{t} \]
If you don’t want to guess or simply take first differences, a better choice is to estimate \(\rho\) from the model \[ e_{t}=\rho e_{t-1}+v_{t} \] where the \(e\)’s are residuals.
Because the \(\rho\)’s are estimated, these procedures are usually called feasible generalized least squares (FGLS).
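One way to run FGLS in practice (a sketch under the assumption of AR(1) disturbances, not a procedure from the original text) is statsmodels’ GLSAR, which alternates between estimating \(\rho\) from the residuals and re-fitting the quasi-differenced regression:
# Iterative FGLS (Cochrane-Orcutt style) with AR(1) disturbances
X_design = sm.add_constant(df["X"])
glsar_model = sm.GLSAR(df["Y"], X_design, rho=1)   # rho=1 means one AR lag
glsar_results = glsar_model.iterative_fit(maxiter=10)
print(glsar_results.params)                        # FGLS estimates of the betas
print(glsar_model.rho)                             # estimated AR(1) coefficient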
Instead of using FGLS, we can keep the OLS estimates and use HAC (heteroscedasticity- and autocorrelation-consistent) standard errors, which some textbooks call Newey-West standard errors.
model = smf.ols(formula="Y ~ X", data=df)
results = model.fit(cov_type="HAC", cov_kwds={"maxlags": 1})
print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Y R-squared: 0.884
Model: OLS Adj. R-squared: 0.881
Method: Least Squares F-statistic: 181.1
Date: Sun, 27 Oct 2024 Prob (F-statistic): 2.07e-18
Time: 17:33:28 Log-Likelihood: -322.41
No. Observations: 53 AIC: 648.8
Df Residuals: 51 BIC: 652.8
Df Model: 1
Covariance Type: HAC
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1151.6375 49.975 23.044 0.000 1053.689 1249.586
X -19.1031 1.420 -13.457 0.000 -21.885 -16.321
==============================================================================
Omnibus: 2.480 Durbin-Watson: 1.007
Prob(Omnibus): 0.289 Jarque-Bera (JB): 1.689
Skew: 0.413 Prob(JB): 0.430
Kurtosis: 3.285 Cond. No. 63.0
==============================================================================
Notes:
[1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction
In small samples, FGLS and HAC are not necessarily better than OLS; by small samples, we mean fewer than roughly \(30\) observations.